AITopics | generation loss

Collaborating Authors

generation loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring Training and Inference Scaling Laws in Generative Retrieval

Cai, Hongru, Li, Yongqi, Yuan, Ruifeng, Wang, Wenjie, Zhang, Zhen, Li, Wenjie, Chua, Tat-Seng

arXiv.org Artificial IntelligenceJun-10-2025

Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models (LLMs) generate target documents directly from a query. As a novel paradigm, the mechanisms that underpin its performance and scalability remain largely unexplored. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance. We propose a novel evaluation metric inspired by contrastive entropy and generation loss, providing a continuous performance signal that enables robust comparisons across diverse generative retrieval methods. Our experiments show that n-gram-based methods align strongly with training and inference scaling laws. We find that increasing model size, training data scale, and inference-time compute all contribute to improved performance, highlighting the complementary roles of these factors in enhancing generative retrieval. Across these settings, LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval. Our findings underscore that model sizes, data availability, and inference computation interact to unlock the full potential of generative retrieval, offering new insights for designing and optimizing future systems.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.18941

Country:

Asia > China (0.29)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Learning to Rank in Generative Retrieval

Li, Yongqi, Yang, Nan, Wang, Liang, Wei, Furu, Li, Wenjie

arXiv.org Artificial IntelligenceDec-16-2023

Generative retrieval stands out as a promising new paradigm in text retrieval that aims to generate identifier strings of relevant passages as the retrieval target. This generative paradigm taps into powerful generative language models, distinct from traditional sparse or dense retrieval methods. However, only learning to generate is insufficient for generative retrieval. Generative retrieval learns to generate identifiers of relevant passages as an intermediate goal and then converts predicted identifiers into the final passage rank list. The disconnect between the learning objective of autoregressive models and the desired passage ranking target leads to a learning gap. To bridge this gap, we propose a learning-to-rank framework for generative retrieval, dubbed LTRGR. LTRGR enables generative retrieval to learn to rank passages directly, optimizing the autoregressive model toward the final passage ranking target via a rank loss. This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems and does not add any burden to the inference stage. We conducted experiments on three public benchmarks, and the results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods. The code and checkpoints are released at https://github.com/liyongqi67/LTRGR.

generative retrieval, identifier, retrieval, (16 more...)

arXiv.org Artificial Intelligence

2306.15222

Country:

North America > Canada (0.07)
Asia > China > Hong Kong (0.04)
Africa > Eswatini > Manzini > Manzini (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition

Xu, Mingxue, Xu, Yao Lei, Mandic, Danilo P.

arXiv.org Artificial IntelligenceJul-2-2023

High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, the associated high dimensionality also introduces considerable model parameters, and a prohibitively high model storage. To address this issue, this work proposes an approach based on the Tensor-Train Decomposition (TTD), where each token embedding is treated as a Matrix Product State (MPS) that can be efficiently computed in a distributed manner. The experimental results on GPT-2 demonstrate that, through our approach, the embedding layer can be compressed by a factor of up to 38.40 times, and when the compression factor is 3.31 times, even produced a better performance than the original GPT-2 model.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2307.00526

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
Asia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

CoBIT: A Contrastive Bi-directional Image-Text Generation Model

You, Haoxuan, Guo, Mandy, Wang, Zhecan, Chang, Kai-Wei, Baldridge, Jason, Yu, Jiahui

arXiv.org Artificial IntelligenceMar-23-2023

The field of vision and language has witnessed a proliferation of pre-trained foundation models. Most existing methods are independently pre-trained with contrastive objective like CLIP, image-to-text generative objective like PaLI, or text-to-image generative objective like Parti. However, the three objectives can be pre-trained on the same data, image-text pairs, and intuitively they complement each other as contrasting provides global alignment capacity and generation grants fine-grained understanding. In this work, we present a Contrastive Bi-directional Image-Text generation model (CoBIT), which attempts to unify the three pre-training objectives in one framework. Specifically, CoBIT employs a novel unicoder-decoder structure, consisting of an image unicoder, a text unicoder and a cross-modal decoder. The image/text unicoders can switch between encoding and decoding in different tasks, enabling flexibility and shared knowledge that benefits both image-to-text and text-to-image generations. CoBIT achieves superior performance in image understanding, image-text understanding (Retrieval, Captioning, VQA, SNLI-VE) and text-based content creation, particularly in zero-shot scenarios. For instance, 82.7% in zero-shot ImageNet classification, 9.37 FID score in zero-shot text-to-image generation and 44.8 CIDEr in zero-shot captioning.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2303.13455

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > Alaska (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Machine Learning Framework for Event Identification via Modal Analysis of PMU Data

Bazargani, Nima T., Dasarathy, Gautam, Sankar, Lalitha, Kosut, Oliver

arXiv.org Artificial IntelligenceOct-3-2022

Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional physics-based feature extraction methods with machine learning to distinguish different event types. Including all measurement channels at each PMU allows exploiting diverse features but also requires learning classification models over a high-dimensional space. To address this issue, various feature selection methods are implemented to choose the best subset of features. Using the obtained subset of features, we investigate the performance of two well-known classification models, namely, logistic regression (LR) and support vector machines (SVM) to identify generation loss and line trip events in two datasets. The first dataset is obtained from simulated generation loss and line trip events in the Texas 2000-bus synthetic grid. The second is a proprietary dataset with labeled events obtained from a large utility in the USA involving measurements from nearly 500 PMUs. Our results indicate that the proposed framework is promising for identifying the two types of events.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TPWRS.2022.3212323

2202.06836

Country:

Europe (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Energy > Power Industry (0.89)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Add feedback

Fine-grained Contrastive Learning for Definition Generation

Zhang, Hengyuan, Li, Dawei, Yang, Shiping, Li, Yanran

arXiv.org Artificial IntelligenceOct-2-2022

Recently, pre-trained transformer-based models have achieved great success in the task of definition generation (DG). However, previous encoder-decoder models lack effective representation learning to contain full semantic components of the given word, which leads to generating under-specific definitions. To address this problem, we propose a novel contrastive learning method, encouraging the model to capture more detailed semantic representations from the definition sequence encoding. According to both automatic and manual evaluation, the experimental results on three mainstream benchmarks demonstrate that the proposed method could generate more specific and high-quality definitions compared with several state-of-the-art models.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.00543

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Cooper FX Arcades review: Plumbing the depths of lo-fi guitar effects

EngadgetOct-7-2020, 12:00:43 GMT

Let's get one thing out of the way right up front: Yes, the main conceit of the $329 Cooper FX Arcades is a little gimmicky. It's a guitar pedal into which you stick cards to apply different effects, kinda like a game console. But while the somewhat novel approach to building a multi-effects unit may have helped Arcades garner attention, this pedal is no mere gimmick. A post shared by Tom Majeski (@cooper.fx) Tom Majeski of Cooper FX is not the first person to take this approach. Line 6 had its ToneCore line of pedals in the mid'aughts, Elta had the Console and TipTop Audio sells the Z-DSP. But Z-DSP is a eurorack module, not a guitar pedal.

artificial intelligence, generation loss, pedal, (12 more...)

Engadget

Genre: Overview (0.34)

Technology: Information Technology > Artificial Intelligence (0.34)

Add feedback

Adversarial Partial Multi-Label Learning

Yan, Yan, Guo, Yuhong

arXiv.org Machine LearningSep-14-2019

Partial multi-label learning (PML), which tackles the problem of learning multi-label prediction models from instances with overcomplete noisy annotations, has recently started gaining attention from the research community. In this paper, we propose a novel adversarial learning model, PML-GAN, under a generalized encoder-decoder framework for partial multi-label learning. The PML-GAN model uses a disambiguation network to identify noisy labels and uses a multi-label prediction network to map the training instances to the disambiguated label vectors, while deploying a generative adversarial network as an inverse mapping from label vectors to data samples in the input feature space. The learning of the overall model corresponds to a minimax adversarial game, which enhances the correspondence of input features with the output labels. Extensive experiments are conducted on multiple datasets, while the proposed model demonstrates the state-of-the-art performance for partial multi-label learning.

artificial intelligence, machine learning, multi-label learning, (15 more...)

arXiv.org Machine Learning

1909.06717

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback